pacman::p_load(sf,sfdep,tidyverse,tmap)Take-Home_Ex1
Analysis of Geospatial Distribution of Bus Ridership by Origin Bus Stop in Singapore
Objectives
The bus system is one of Singapore’s two pillars of public transport aside from the MRT. The bus system ensures convenient and affordable short-, medium-, and long-distance travel for riders. Thanks to the widespread availability of bus stops as compared to MRT stations, it has a high level of accessibility. However, this also leaves the system prone to under- or over-investment in terms of the number of bus routes, leading some stops and routes to be under-served or over-served.
The objective of this study is to examine the distribution of bus trips in Singapore by analyzing the number of trips by originating bus stops. It will consist of two levels of analysis:
- GeoVisualisation and Analysis: Visualizing the number of trips by originating bus stops and provide descriptive statistics of the distribution of trips by bus stops.
- Local Indicators of Spatial Association Analysis (LISA): This analysis involves the calculation of Local Moran’s I to determine local spatial autocorrelation between a bus stop and its neighbors. Additionally, visualizations such as a LISA cluster map will be created for easier comparison.
Getting Started
First, the necessary R packages will be loaded using the p_load() function of the pacman package. p_load() will also install any package which is not already installed. The following packages will be loaded:
sf: For handling of geospatial data.
sfdep: For determining the spatial dependence of spatial features. The three main categories of functionality relates to the determination of geometry neighbors, weights, and LISA.
tidyverse: For manipulation of non-spatial data. This package contains ggplot2 for plotting, dplyr and tidyr for dataframe manipulation, and readr for reading comma-separated values (CSV).
tmap: For thematic mapping, especially the mapping of simple features data frame.
Importing Required Data
For the purpose of this study, two types of data will be used: geospatial data which consists of spatial features and their coordinates information, and aspatial data which consists of attributes which can be ascribed to the geospatial data. Specifically, the following datasets will be used for each type:
- Geospatial Data:
- BusStop.shp: This shape file contains the location of the bus stops in Singapore as at July 2023. This file can be retrieved from the Land Transport Authority (LTA) Data Mall (link).
- Aspatial Data:
- origin_destination_bus_202309.csv: This CSV file contains the detail of bus trips from an originating bus stop to a destination bus stop, identified by their unique codes, each hour of the day during September 2023. The data is further broken down into weekend or weekday, but not by the specific day of the week. This data can be retrieved by using the LTA Data Mall’s API (link).
The first steps taken will be to import these files into the R environment in a manipulable format.
Importing Geospatial Data
Geospatial data can be imported using the st_read() function of the sf package. This will import the file into the R environment as a sf (simple features) data frame. st_transform() is added to transform the Coordinate Reference System (CRS) to EPSG: 3414, which is the CRS of Singapore.
In st_read():
dsn: the directory where the shape file is stored
layer: the name of the shape file
busstop <- st_read(dsn = 'data/geospatial',
layer = 'BusStop') %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`D:\phlong2023\ISSS624\Take-Home_Ex\Take-Home_Ex1\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
From the message provided by R, it can be seen that the busstop sf data frame has 5161 rows, 3 columns, and has a CRS of SVY 21.
To get a better grasp of the busstop data frame, glimpse() function can be used.
The data type for each column can be seen as well as some of their values. For sf data frames, there is a geometry column (POINT type) which contains the location information for each polygon.
glimpse(busstop)Rows: 5,161
Columns: 4
$ BUS_STOP_N <chr> "22069", "32071", "44331", "96081", "11561", "66191", "2338…
$ BUS_ROOF_N <chr> "B06", "B23", "B01", "B05", "B05", "B03", "B02A", "B02", "B…
$ LOC_DESC <chr> "OPP CEVA LOGISTICS", "AFT TRACK 13", "BLK 239", "GRACE IND…
$ geometry <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…
Importing Aspatial Data
The read_csv() function of readr can be used to import the origin_destination_bus_202309 CSV file into the R environment as a data frame.
passenger <- read_csv('data/aspatial/origin_destination_bus_202309.csv')From the message provided by R, it can be seen that the passenger has 5,714,196 rows and 7 columns.
head() can be used instead of glimpse() to view the top five rows of the passenger data frame. This will also allow us to see the data type of each of the column.
head(passenger)# A tibble: 6 × 7
YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE
<chr> <chr> <dbl> <chr> <chr> <chr>
1 2023-09 WEEKENDS/… 17 BUS 24499 22221
2 2023-09 WEEKENDS/… 10 BUS 65239 65159
3 2023-09 WEEKDAY 10 BUS 65239 65159
4 2023-09 WEEKDAY 7 BUS 23519 23311
5 2023-09 WEEKENDS/… 7 BUS 23519 23311
6 2023-09 WEEKENDS/… 11 BUS 52509 42041
# ℹ 1 more variable: TOTAL_TRIPS <dbl>
Note that the ORIGIN_PT_CODE and DESTINATION_PT_CODE are in the character (“chr”) data type. However, we would like it to be in the factor (“fctr”) data type for easier categorization and sorting. This can be done by using the as.factor() function.
passenger$ORIGIN_PT_CODE <- as.factor(passenger$ORIGIN_PT_CODE)
passenger$DESTINATION_PT_CODE <- as.factor(passenger$DESTINATION_PT_CODE)We can use head() to check the data type of the passenger data frame.
head(passenger)# A tibble: 6 × 7
YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE
<chr> <chr> <dbl> <chr> <fct> <fct>
1 2023-09 WEEKENDS/… 17 BUS 24499 22221
2 2023-09 WEEKENDS/… 10 BUS 65239 65159
3 2023-09 WEEKDAY 10 BUS 65239 65159
4 2023-09 WEEKDAY 7 BUS 23519 23311
5 2023-09 WEEKENDS/… 7 BUS 23519 23311
6 2023-09 WEEKENDS/… 11 BUS 52509 42041
# ℹ 1 more variable: TOTAL_TRIPS <dbl>
Data Preparation
In order to perform our analysis, certain manipulations must be made in order to prepare the data. Specifically, the passenger data set will be filtered and summarzied. Subsequently, it will be combined with the busstop data set based on the bus stop code variable present in both data frames.
Wrangling Aspatial Data
Filtering the passenger Data Set for Desired Time Frames
For the purpose of this study, the passenger data set needs to be filtered to only contain trips falling within one of the following time frames:
| Peak hour period | Bus tap on time |
|---|---|
| Weekday morning peak | 6am to 9am |
| Weekday afternoon peak | 5pm to 8pm |
| Weekend/holiday morning peak | 11am to 2pm |
| Weekend/holiday evening peak | 4pm to 7pm |
This can be accomplished using the filter() function and the dplyr steps. We can create four separate data frames to store the four different time frames
# Weekday morning peak 6am - 9am
passenger_wd_69 <- passenger %>%
filter(DAY_TYPE == 'WEEKDAY') %>%
filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9)
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
passenger_wd_1720 <- passenger %>%
filter(DAY_TYPE == 'WEEKDAY') %>%
filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20)
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
passenger_weh_1114 <- passenger %>%
filter(DAY_TYPE == 'WEEKENDS/HOLIDAY') %>%
filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14)
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
passenger_weh_1619 <- passenger %>%
filter(DAY_TYPE == 'WEEKENDS/HOLIDAY') %>%
filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19)After the different trips have been categorized into their separate data frames, the total number trips for each origin bus stop can be tallied into a single statistic for the study period. This can be accomplished using the summarize() function. The example below shows this operation using passenger_wd_69.
The group_by() function is used to instruct R to conduct operations based on the groups created by group_by(). In this case, the summary operations will be done based on the origin bus stop codes.
# Tallying the trips by origin bus stop for Weekday morning peak 6am - 9am
passenger_wd_69_tallied <- passenger_wd_69 %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
passenger_wd_69_tallied# A tibble: 5,020 × 2
ORIGIN_PT_CODE TRIPS
<fct> <dbl>
1 01012 1640
2 01013 764
3 01019 1322
4 01029 2373
5 01039 2562
6 01059 1582
7 01109 144
8 01112 7993
9 01113 6734
10 01119 3736
# ℹ 5,010 more rows
As can be seen, the newly created data frame consists only of the total trip numbers for each origin bus stop. This can be repeated for the other time frames.
# Tallying the trips by origin bus stop for Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
passenger_wd_1720_tallied <- passenger_wd_1720 %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
# Tallying the trips by origin bus stop for Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
passenger_weh_1114_tallied <- passenger_weh_1114 %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
# Tallying the trips by origin bus stop for Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
passenger_weh_1619_tallied <- passenger_weh_1619 %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))Wrangling Geospatial Data
In order to adequately visualize the busstop sf data frame, we need to define a mapping layer. An example of a mapping layer would be to use the Master Plan 2019 Planning Sub-zone created by the Urban Redevelopment Authority (URA). However, for the purpose of this study, a hexagon layer will be used to ensure standardization of the size of each polygon and the evenly spaced gaps between a polygon and its neighbors.
The steps in this section will detail the creation of the hexagon layer using the busstop data frame and visualize the layer on a map of Singapore.
Creating a Hexagon Layer in R
The steps taken in this section is based on the guide provided by Kenneth Wong of Urban Data Palette (link).
Firstly, a hexagon or honeycomb grid can be created based on the busstop data frame using the st_make_grid() function.
There are some notable arguments in the st_make_grid() function:
cellsize = c(100,100): This argument indicates the size of each hexagon. If the cell size is large, each hexagon can encompasses multiple bus stops, whereas if a smaller cell size can help us differentiate between individual bus stop. However, a smaller cell size with many hexagons will take more time to create.
what = ‘polygons’: We would like to create polygons on a grid.
square = FALSE: The default argument is TRUE, which would create a square grid. FALSE is specified in order to create a hexagon grid.
area_honeycomb_grid = st_make_grid(busstop, cellsize = c(500,500), what = 'polygons', square = FALSE)
area_honeycomb_gridGeometry set for 5580 features
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3470.122 ymin: 26193.43 xmax: 48720.12 ymax: 53184.55
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
The area_honeycomb_grid contains 136906 features of the same Projected CRS as the busstop data frame. If the plot() function is used, the hexagon grid will be displayed. However, this grid contains no information and might be too small to discern the individual cell.
#qtm(area_honeycomb_grid)The area_honey_comb needs to be converted to a sf data frame for further manipulation using st_sf(). Additionally, we can assign a unique id to each of the hexagon cell in area_honey_comb using mutate().
honeycomb_grid_sf = st_sf(area_honeycomb_grid) %>%
# add grid ID
mutate(grid_id = 1:length(lengths(area_honeycomb_grid)))Following this, we can use lengths() and st_intersect() to determine the allocation of bus stop in each cell. The goal is to create a new column, consisting of the number of bus stop in each of the cell. The filter() function can then be added to remove all cells with no bus stop and create the final sf data frame.
# Counting the number of bus stop in each cell
honeycomb_grid_sf$n_busstop = lengths(st_intersects(honeycomb_grid_sf,busstop))
# Removing all cells without bus stop
honeycomb_count = filter(honeycomb_grid_sf, n_busstop > 0)At this point, the hexagon grid of bus stop can be drawn onto a map of Singapore using the functions of the tmap package. Additionally, the n_busstop column can be passed to the tm_fill() function to shade the cell based on the number of bus stops in it.
Several functions are added to make the map interactive and aesthetically pleasing
tmap_mode(‘view’): Creates an interactive map which allow zooming and interacting with cells on the map
pop.vars: Identifying the legend and value which pops up when a cell is selected. In this case, it is the number of bus stops.
popup.format: Specifying the format of the variable to be displayed when selecting a cell.
tm_basemap: Choosing the basemap layer on which the hexagon grid will be drawn. OpenStreetMap is chosen due to its high fidelity while not being overly crowded. Additionally, OpenStreetMap displays icon for bus stops in Singapore, allowing user to visually check any cell.
- If an incorrect CRS was specified in the earlier steps, the basemap will be of an incorrect location or alignment.
tmap_mode('view')
bushexmap <- tm_shape(honeycomb_count)+
tm_fill(
col = "n_busstop",
palette = "Blues",
style = "cont",
title = "Number of bus stop",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of bus stop: " = "n_busstop"
),
popup.format = list(
n_busstop = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))
bushexmapSearch for the bus stop near your place and compare the numbers between the four maps by zooming in! Do you think it’s accurate?
By looking at the illustration, we can see that each cell might contain up to four bus stops. A bar chat can be drawn with the ggplot2 package to visualize the distribution of number of bus stop in each cell.
ggplot(honeycomb_count, aes(x=n_busstop))+
geom_bar()+
theme_classic()+
geom_text(aes(label = ..count..), stat = "count", vjust = -0.5, colour = "black")
As can be seen, the majority of cells contain 1-2 bus stop with only 250 cells containing more than 3 bus stops. This shows that the cells adequately capture solitary bus stop, as well as pairs of bus stops (bus stops which are opposite each other, served by the same bus services).
Combining Aspatial and Geospatial Data
In order to conduct geospatial analysis, a data frame which contains the hexagon cells as well as the number of bus trips for each cells must be created. This can be done using the left_join argument.
There are important arguments which can be used to create a cleaner combined data frame.
by = join_by(BUS_STOP_N == ORIGIN_PT_CODE)): Indicate the column by which the two data frames can be matched and joined. In this case, the bus stop code will be used.
select(1,4,5): Indicate the index number of the columns to be kept in the final data frame. Only the bus stop number (column 1), total number of trips (column 4), and geometry (column 5) will be kept.
replace(is.na(.),0): Replace all value of NA with 0. This is to ensure that bus stop with no trips in a given time frame is accurately tallied at 0.
# Weekday morning peak 6am - 9am
passenger_wd_69_combined <- left_join(busstop, passenger_wd_69_tallied, by = join_by(BUS_STOP_N == ORIGIN_PT_CODE))%>%
select(1,4,5)%>%
replace(is.na(.),0)
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
passenger_wd_1720_combined <- left_join(busstop, passenger_wd_1720_tallied, by = join_by(BUS_STOP_N == ORIGIN_PT_CODE))%>%
select(1,4,5)%>%
replace(is.na(.),0)
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
passenger_weh_1114_combined <- left_join(busstop, passenger_weh_1114_tallied, by = join_by(BUS_STOP_N == ORIGIN_PT_CODE))%>%
select(1,4,5)%>%
replace(is.na(.),0)
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
passenger_weh_1619_combined <- left_join(busstop, passenger_weh_1619_tallied, by = join_by(BUS_STOP_N == ORIGIN_PT_CODE))%>%
select(1,4,5)%>%
replace(is.na(.),0)It is important to note that the bus stops and their total trips have not been tallied into the hexagon cells. st_join() can be used to accomplish this for each time frame.
The by argument in st_join() can be passed the function st_within to specify that we would like to join the two data frames where the geometry in the latter is within the geometry of the former. In this case, it would mean that two rows will be joined where the bus stop lies within a particular polygon.
- The group_by() and summarise() functions here are used similarly to before, they sums up the total number of trips for all the bus stops in the hexagon, based on its grid_id, and create a new column called TOTAL_TRIP.
# Weekday morning peak 6am - 9am
hex_passenger_wd_69 <- st_join(honeycomb_count, passenger_wd_69_combined, by = st_within(sparse = FALSE), largest = TRUE) %>%
group_by(grid_id)%>%
summarise(TOTAL_TRIP = sum(TRIPS))
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
hex_passenger_wd_1720 <- st_join(honeycomb_count, passenger_wd_1720_combined, by = st_within(sparse = FALSE), largest = TRUE) %>%
group_by(grid_id)%>%
summarise(TOTAL_TRIP = sum(TRIPS))
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
hex_passenger_weh_1114 <- st_join(honeycomb_count, passenger_weh_1114_combined, by = st_within(sparse = FALSE), largest = TRUE) %>%
group_by(grid_id)%>%
summarise(TOTAL_TRIP = sum(TRIPS))
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
hex_passenger_weh_1619 <- st_join(honeycomb_count, passenger_weh_1619_combined, by = st_within(sparse = FALSE), largest = TRUE) %>%
group_by(grid_id)%>%
summarise(TOTAL_TRIP = sum(TRIPS))In sum, the analysis will revolve around the four following data frames which contain the spatial information of the hexagon as well as the total number of trips for each interested time frame:
- hex_passenger_wd_69: Weekday morning peak 6am - 9am
- hex_passenger_wd_1720: Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
- hex_passenger_weh_1114: Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
- hex_passenger_weh_1619: Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
Cleanup Step
Before we move on to Geovisualization and the analysis of LISA, it would be wise to remove objects which will no longer be used. This will help to free up memory for other tasks. rm() can be used to perform this task
rm(list = c('area_honeycomb_grid', 'bushexmap', 'honeycomb_count', 'honeycomb_grid_sf', 'passenger_wd_1720', 'passenger_wd_1720_tallied', 'passenger_wd_1720_combined', 'passenger_wd_69', 'passenger_wd_69_tallied', 'passenger_wd_69_combined', 'passenger_weh_1114', 'passenger_weh_1114_tallied', 'passenger_weh_1114_combined', 'passenger_weh_1619', 'passenger_weh_1619_tallied', 'passenger_weh_1619_combined'))Geovisualization and Analysis
The beginning step of the analysis would be to visualize the distribution of bus trips on the hexagon layer. This can be accomplished with the mapping functions of the tmap package. However, unlike the geovisualization of bus stop per hexagon, the number of trips for each hexagon depending on the time frame varies widely. Let’s confirm this by drawing a histogram of the distribution of trips in each time frame.
weekday_morning_hist <- ggplot(hex_passenger_wd_69, aes(x=TOTAL_TRIP))+
geom_histogram()+
scale_x_continuous(labels = scales::comma)+
ggtitle('Weekday Morning')+
theme_classic()
weekday_evening_hist <- ggplot(hex_passenger_wd_1720, aes(x=TOTAL_TRIP))+
geom_histogram()+
scale_x_continuous(labels = scales::comma)+
ggtitle('Weekday Evening')+
theme_classic()
weekend_noon_hist <- ggplot(hex_passenger_weh_1114, aes(x=TOTAL_TRIP))+
geom_histogram()+
scale_x_continuous(labels = scales::comma)+
ggtitle('Weekend Noon')+
theme_classic()
weekend_evening_hist <- ggplot(hex_passenger_weh_1619, aes(x=TOTAL_TRIP))+
geom_histogram()+
scale_x_continuous(labels = scales::comma)+
ggtitle('Weekend Evening')+
theme_classic()
gridExtra::grid.arrange(weekday_morning_hist, weekday_evening_hist, weekend_noon_hist, weekend_evening_hist,
nrow = 2, ncol = 2)
tmap_mode('view')
### Weekday morning peak 6am - 9am
# Quantile style
weekday_morning_quantile <- tm_shape(hex_passenger_wd_69) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "quantile",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekday Morning Peak (Quantile)', title.position = c('right', 'top'))
# Pretty style
weekday_morning_pretty <- tm_shape(hex_passenger_wd_69) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "pretty",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekday Morning Peak (Jenk)', title.position = c('right', 'top'))
# Equal style
weekday_morning_equal <- tm_shape(hex_passenger_wd_69) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "equal",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekday Morning Peak (Equal)', title.position = c('right', 'top'))
tmap_arrange(weekday_morning_quantile, weekday_morning_pretty, weekday_morning_equal, asp = 1, ncol = 3)As can be seen, only the ‘quantile’ style can display the difference in distribution of trips between the different hexagons. This can be attributed to the fact that the range of trips between the hexagons are too large. Therefore, it is possible to move forward using the ‘quantile’ style for visualization.
It is good to remove the 3 plots created to plot the different styles since they will not be used and will only take up memory.
rm(list=c('weekday_morning_equal','weekday_morning_quantile','weekday_morning_pretty'))tmap_mode('view')
# Weekday morning peak 6am - 9am
weekday_morning <- tm_shape(hex_passenger_wd_69) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "quantile",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekday Morning Peak', title.position = c('right', 'top'))
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
weekday_afternoon <- tm_shape(hex_passenger_wd_1720) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "quantile",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekday Afternoon Peak', title.position = c('right', 'top'))
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
weekend_noon <- tm_shape(hex_passenger_weh_1114) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "quantile",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekend/holiday Morning Peak', title.position = c('right', 'top'))
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
weekend_evening <- tm_shape(hex_passenger_weh_1619) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "quantile",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekend/holiday Evening Peak', title.position = c('right', 'top'))
tmap_arrange(weekday_morning, weekday_afternoon, weekend_noon, weekend_evening, nrow = 2, ncol = 2)Search for the bus stop near your place and compare the numbers between the four maps by zooming in! Do you think it’s accurate?
From the rough visualization, it’s clear that not all bus stops experience a similar level of traffic throughout different timing of the days. Additionally, based on the quantiles created by tmap, it seems that the ranges of passenger traffic are radically different between the four time windows. For example, certain grids during Weekday Morning Peak, a hexagon could reach 328,545 passenger trips, whereas the highest number during Weekend/holiday Morning Peak only reaches 112,330.
Local Indicators of Spatial Autocorrelation (LISA)
Defining the Neighborhood
Before the LISA analysis can be conducted, it is important to define our neighborhood, or the neighbors of each polygon. This is based on the ideas that neighbors, or spatial objects which are related to other spatial objects based on sharing a common boundary or lying with a certain distance of one another, might affect each other.
In this case, we would like to identify the neighbors of each hexagon so that LISA analysis can determine if the number of trips in a hexagon in a time frame is correlated to the number of trips of the hexagons around it, either in the same direction or opposite direction.
The first step to determining the neighborhood is to choose a method by which neighbors are classified:
- Contiguity-based Method: Based on the sharing of boundaries, either edges and/or points (Queen and Rook method).
- Distance-based Method: Based on the distances between the centroid (central point of each hexagon) of each polygon. This can either be set to a distance where each hexagon has at least one neighbor (Fixed Distance) or where each polygon has a certain number of neighbors (Adaptive Distance).
It is important to choose the appropriate method according to each situation. However, in this study, Contiguity-based methods can be preemptively ruled out due to the fact that some cells have no contiguous neighbors, as can be seen below.

If the contiguity method is used, the LISA calculation for these cells would not be conducive for analysis as they technically have no neighbors. This can be confirmed by using the st_contiguity() function to create a Queen contiguity matrix for one time frame
wm_q <- st_contiguity(st_geometry(hex_passenger_wd_1720))
summary(wm_q)Neighbour list object:
Number of regions: 1524
Number of nonzero links: 6880
Percentage nonzero weights: 0.2962228
Average number of links: 4.514436
10 regions with no links:
307 565 730 984 1051 1419 1509 1512 1516 1524
Link number distribution:
0 1 2 3 4 5 6
10 39 112 205 291 364 503
39 least connected regions:
1 7 22 38 98 169 187 195 211 218 258 259 264 267 287 457 566 611 646 700 712 736 755 788 873 1025 1026 1050 1090 1218 1468 1475 1486 1504 1505 1507 1510 1514 1523 with 1 link
503 most connected regions:
10 13 16 17 24 25 31 35 42 43 48 53 55 60 63 67 73 77 80 81 84 85 87 88 91 92 97 102 107 111 117 121 127 133 140 141 143 148 149 150 154 155 156 157 163 164 165 173 174 175 183 184 185 191 192 193 194 200 201 202 205 206 207 208 216 229 239 243 244 246 257 266 271 278 279 283 284 291 292 298 300 301 302 304 310 311 313 314 317 322 325 326 328 338 339 340 341 344 353 356 364 369 391 392 401 403 404 408 415 419 424 426 432 438 439 440 442 445 453 454 455 464 470 471 472 473 477 481 484 485 489 493 497 498 500 506 507 511 517 518 521 522 527 533 538 543 547 552 553 554 556 560 562 568 572 577 578 580 581 585 594 595 598 602 603 608 609 613 619 623 628 630 637 640 641 642 652 653 654 658 659 661 662 663 673 674 675 681 684 685 686 691 692 694 695 704 705 708 709 710 717 720 721 728 731 732 733 744 745 759 761 762 764 775 778 779 780 781 786 787 791 792 793 796 797 798 799 803 804 810 811 814 815 816 817 823 824 827 828 829 834 835 836 845 847 848 850 851 852 854 855 856 857 858 864 867 869 870 871 875 876 880 881 882 884 885 886 888 889 891 892 895 897 900 903 906 909 910 914 918 923 925 930 931 932 934 935 939 941 947 948 949 950 951 952 958 962 963 966 967 972 973 975 976 977 981 988 989 990 991 992 994 1000 1001 1002 1003 1008 1015 1016 1017 1018 1028 1029 1030 1032 1033 1040 1041 1042 1046 1054 1055 1058 1060 1061 1066 1067 1068 1070 1071 1072 1073 1080 1082 1083 1084 1087 1093 1097 1104 1105 1106 1109 1110 1114 1115 1121 1124 1125 1126 1132 1137 1138 1139 1140 1145 1146 1148 1149 1150 1151 1152 1154 1160 1161 1162 1166 1167 1168 1170 1173 1174 1175 1176 1180 1181 1182 1183 1184 1188 1190 1194 1195 1196 1197 1198 1205 1206 1207 1208 1209 1210 1211 1214 1215 1221 1222 1223 1224 1225 1231 1237 1238 1239 1243 1248 1249 1255 1257 1258 1259 1265 1269 1270 1275 1276 1277 1281 1285 1287 1293 1303 1305 1306 1307 1308 1320 1322 1328 1329 1330 1331 1333 1334 1335 1338 1339 1340 1341 1347 1348 1349 1356 1357 1359 1360 1365 1369 1370 1372 1373 1375 1376 1381 1384 1385 1386 1388 1392 1395 1397 1399 1402 1410 1412 1416 1421 1422 1424 1428 1429 1430 1431 1432 1437 1438 1439 1440 1444 1445 1446 1450 1451 1453 1455 1457 1460 1461 1463 1464 1465 1466 1473 with 6 links
As can be seen from the weight matrix, 10 hexagon cells have 0 neighbor. Therefore, a Distance-based Method would be suitable for analysis. Both Fixed Distance and Adaptive Distance methods will be utilized for more comprehensive comparison.
Fixed Distance Weight Matrix
Creating the Distance Matrix
st_dist_band() of sfdep is incredibly powerful in that it can create a neighbor list based on distance between the centroid of polygons and a lower and upper bound distance to other centroid. The default arguments for st_dist_band() will find neighbors with a lower and upper bound so that each hexagon will have at least one neighbor. This is the equivalent of the steps in spdep of the function knearneigh() of k=1.
st_inverse_distance() of a sfdep can be combined with st_dist_band() in a dplyr step to create a new column in each data frame of the different time frames to create a neighbor list and a inverse distance weight list.
# Weekday morning peak 6am - 9am
wd69_nb <- hex_passenger_wd_69 %>%
mutate(nb = st_dist_band(area_honeycomb_grid),
wt = st_inverse_distance(nb, area_honeycomb_grid),
lag_trip = st_lag(TOTAL_TRIP,nb,wt),
.before = 1) # to put them in the front
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
wd1720_nb <- hex_passenger_wd_1720 %>%
mutate(nb = st_dist_band(area_honeycomb_grid),
wt = st_inverse_distance(nb, area_honeycomb_grid),
lag_trip = st_lag(TOTAL_TRIP,nb,wt),
.before = 1) # to put them in the front
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
weh1114_nb <- hex_passenger_weh_1114 %>%
mutate(nb = st_dist_band(area_honeycomb_grid),
wt = st_inverse_distance(nb, area_honeycomb_grid),
lag_trip = st_lag(TOTAL_TRIP,nb,wt),
.before = 1) # to put them in the front
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
weh1619_nb <- hex_passenger_weh_1619 %>%
mutate(nb = st_dist_band(area_honeycomb_grid),
wt = st_inverse_distance(nb, area_honeycomb_grid),
lag_trip = st_lag(TOTAL_TRIP,nb,wt),
.before = 1) # to put them in the frontCalculating LISA
local_moran() of the sfdep package can be used to calculate Local Moran’s I Statistic and other related statistics.
# Weekday morning peak 6am - 9am
wd69_lisa <- wd69_nb %>%
mutate(local_moran = local_moran(TOTAL_TRIP, nb, wt, nsim = 199),
.before = 1) %>%
unnest(local_moran)
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
wd1720_lisa <- wd1720_nb %>%
mutate(local_moran = local_moran(TOTAL_TRIP, nb, wt, nsim = 199),
.before = 1) %>%
unnest(local_moran)
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
weh1114_lisa <- weh1114_nb %>%
mutate(local_moran = local_moran(TOTAL_TRIP, nb, wt, nsim = 199),
.before = 1) %>%
unnest(local_moran)
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
weh1619_lisa <- weh1619_nb %>%
mutate(local_moran = local_moran(TOTAL_TRIP, nb, wt, nsim = 199),
.before = 1) %>%
unnest(local_moran)tmap_mode('view')
tm_shape(wd1720_lisa)+
tm_fill(col = 'mean',
palette = "RdBu",
style = "cat",
title = "ii"
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))